While waiting for Star Wars: The Force Awakens to come out, the team at FiveThirtyEight became interested in answering some questions about Star Wars fans. In particular, they wondered: does the rest of America realize that “The Empire Strikes Back” is clearly the best of the bunch?

The team needed to collect data addressing this question. To do this, they surveyed Star Wars fans using the online tool SurveyMonkey. They received 835 total responses, which you can download from their GitHub repository.

For this project, you'll be cleaning and exploring the data set in a Jupyter notebook. To see a sample notebook containing all of the answers, visit the project's GitHub repository.



In [42]:

    
import pandas as pd
star_wars = pd.read_csv('star_wars.csv', encoding='ISO-8859-1')

We need to specify an encoding because the file contains bytes that aren't valid in UTF-8, the default encoding pandas uses when reading a file. You can read more about character encodings on developer Joel Spolsky's blog.
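To see why the encoding matters, here is a minimal sketch that uses only built-in string methods. The sample text is made up for illustration and is not taken from the survey file.

```python
# Hypothetical sample text containing a non-ASCII character.
text = "Café"

# Encode the text as ISO-8859-1 (Latin-1): 'é' becomes the single byte 0xE9.
latin1_bytes = text.encode("ISO-8859-1")

# Decoding those bytes as UTF-8 fails, because a lone 0xE9 byte is not valid UTF-8.
try:
    latin1_bytes.decode("utf-8")
    utf8_decode_failed = False
except UnicodeDecodeError:
    utf8_decode_failed = True

# Decoding with the matching encoding recovers the original text.
round_trip = latin1_bytes.decode("ISO-8859-1")
print(utf8_decode_failed, round_trip)
```

Reading a file with the wrong encoding runs into the same kind of mismatch, which is why the encoding is passed to read_csv explicitly.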



In [43]:

    
star_wars.head(10)









    Out[43]:






  
    
      
	RespondentID	Have you seen any of the 6 films in the Star Wars franchise?	Do you consider yourself to be a fan of the Star Wars film franchise?	Which of the following Star Wars films have you seen? Please select all that apply.	Unnamed: 4	Unnamed: 5	Unnamed: 6	Unnamed: 7	Unnamed: 8	Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.	...	Unnamed: 28	Which character shot first?	Are you familiar with the Expanded Universe?	Do you consider yourself to be a fan of the Expanded Universe?ÂÃ¦	Do you consider yourself to be a fan of the Star Trek franchise?	Gender	Age	Household Income	Education	Location (Census Region)
0	NaN	Response	Response	Star Wars: Episode I  The Phantom Menace	Star Wars: Episode II  Attack of the Clones	Star Wars: Episode III  Revenge of the Sith	Star Wars: Episode IV  A New Hope	Star Wars: Episode V The Empire Strikes Back	Star Wars: Episode VI Return of the Jedi	Star Wars: Episode I  The Phantom Menace	...	Yoda	Response	Response	Response	Response	Response	Response	Response	Response	Response
1	3.292880e+09	Yes	Yes	Star Wars: Episode I  The Phantom Menace	Star Wars: Episode II  Attack of the Clones	Star Wars: Episode III  Revenge of the Sith	Star Wars: Episode IV  A New Hope	Star Wars: Episode V The Empire Strikes Back	Star Wars: Episode VI Return of the Jedi	3	...	Very favorably	I don't understand this question	Yes	No	No	Male	18-29	NaN	High school degree	South Atlantic
2	3.292880e+09	No	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	Yes	Male	18-29	$0 - $24,999	Bachelor degree	West South Central
3	3.292765e+09	Yes	No	Star Wars: Episode I  The Phantom Menace	Star Wars: Episode II  Attack of the Clones	Star Wars: Episode III  Revenge of the Sith	NaN	NaN	NaN	1	...	Unfamiliar (N/A)	I don't understand this question	No	NaN	No	Male	18-29	$0 - $24,999	High school degree	West North Central
4	3.292763e+09	Yes	Yes	Star Wars: Episode I  The Phantom Menace	Star Wars: Episode II  Attack of the Clones	Star Wars: Episode III  Revenge of the Sith	Star Wars: Episode IV  A New Hope	Star Wars: Episode V The Empire Strikes Back	Star Wars: Episode VI Return of the Jedi	5	...	Very favorably	I don't understand this question	No	NaN	Yes	Male	18-29	$100,000 - $149,999	Some college or Associate degree	West North Central
5	3.292731e+09	Yes	Yes	Star Wars: Episode I  The Phantom Menace	Star Wars: Episode II  Attack of the Clones	Star Wars: Episode III  Revenge of the Sith	Star Wars: Episode IV  A New Hope	Star Wars: Episode V The Empire Strikes Back	Star Wars: Episode VI Return of the Jedi	5	...	Somewhat favorably	Greedo	Yes	No	No	Male	18-29	$100,000 - $149,999	Some college or Associate degree	West North Central
6	3.292719e+09	Yes	Yes	Star Wars: Episode I  The Phantom Menace	Star Wars: Episode II  Attack of the Clones	Star Wars: Episode III  Revenge of the Sith	Star Wars: Episode IV  A New Hope	Star Wars: Episode V The Empire Strikes Back	Star Wars: Episode VI Return of the Jedi	1	...	Very favorably	Han	Yes	No	Yes	Male	18-29	$25,000 - $49,999	Bachelor degree	Middle Atlantic
7	3.292685e+09	Yes	Yes	Star Wars: Episode I  The Phantom Menace	Star Wars: Episode II  Attack of the Clones	Star Wars: Episode III  Revenge of the Sith	Star Wars: Episode IV  A New Hope	Star Wars: Episode V The Empire Strikes Back	Star Wars: Episode VI Return of the Jedi	6	...	Very favorably	Han	Yes	No	No	Male	18-29	NaN	High school degree	East North Central
8	3.292664e+09	Yes	Yes	Star Wars: Episode I  The Phantom Menace	Star Wars: Episode II  Attack of the Clones	Star Wars: Episode III  Revenge of the Sith	Star Wars: Episode IV  A New Hope	Star Wars: Episode V The Empire Strikes Back	Star Wars: Episode VI Return of the Jedi	4	...	Very favorably	Han	No	NaN	Yes	Male	18-29	NaN	High school degree	South Atlantic
9	3.292654e+09	Yes	Yes	Star Wars: Episode I  The Phantom Menace	Star Wars: Episode II  Attack of the Clones	Star Wars: Episode III  Revenge of the Sith	Star Wars: Episode IV  A New Hope	Star Wars: Episode V The Empire Strikes Back	Star Wars: Episode VI Return of the Jedi	5	...	Somewhat favorably	Han	No	NaN	No	Male	18-29	$0 - $24,999	Some college or Associate degree	South Atlantic

10 rows × 38 columns



In [44]:

    
star_wars.columns









    Out[44]:





Index(['RespondentID',
       'Have you seen any of the 6 films in the Star Wars franchise?',
       'Do you consider yourself to be a fan of the Star Wars film franchise?',
       'Which of the following Star Wars films have you seen? Please select all that apply.',
       'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6', 'Unnamed: 7', 'Unnamed: 8',
       'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.',
       'Unnamed: 10', 'Unnamed: 11', 'Unnamed: 12', 'Unnamed: 13',
       'Unnamed: 14',
       'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.',
       'Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19',
       'Unnamed: 20', 'Unnamed: 21', 'Unnamed: 22', 'Unnamed: 23',
       'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26', 'Unnamed: 27',
       'Unnamed: 28', 'Which character shot first?',
       'Are you familiar with the Expanded Universe?',
       'Do you consider yourself to be a fan of the Expanded Universe?ÂÃ¦',
       'Do you consider yourself to be a fan of the Star Trek franchise?',
       'Gender', 'Age', 'Household Income', 'Education',
       'Location (Census Region)'],
      dtype='object')



In [45]:

    
star_wars = star_wars[pd.notnull(star_wars['RespondentID'])]
star_wars.head()









    Out[45]:






  
    
      
	RespondentID	Have you seen any of the 6 films in the Star Wars franchise?	Do you consider yourself to be a fan of the Star Wars film franchise?	Which of the following Star Wars films have you seen? Please select all that apply.	Unnamed: 4	Unnamed: 5	Unnamed: 6	Unnamed: 7	Unnamed: 8	Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.	...	Unnamed: 28	Which character shot first?	Are you familiar with the Expanded Universe?	Do you consider yourself to be a fan of the Expanded Universe?ÂÃ¦	Do you consider yourself to be a fan of the Star Trek franchise?	Gender	Age	Household Income	Education	Location (Census Region)
1	3.292880e+09	Yes	Yes	Star Wars: Episode I  The Phantom Menace	Star Wars: Episode II  Attack of the Clones	Star Wars: Episode III  Revenge of the Sith	Star Wars: Episode IV  A New Hope	Star Wars: Episode V The Empire Strikes Back	Star Wars: Episode VI Return of the Jedi	3	...	Very favorably	I don't understand this question	Yes	No	No	Male	18-29	NaN	High school degree	South Atlantic
2	3.292880e+09	No	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	Yes	Male	18-29	$0 - $24,999	Bachelor degree	West South Central
3	3.292765e+09	Yes	No	Star Wars: Episode I  The Phantom Menace	Star Wars: Episode II  Attack of the Clones	Star Wars: Episode III  Revenge of the Sith	NaN	NaN	NaN	1	...	Unfamiliar (N/A)	I don't understand this question	No	NaN	No	Male	18-29	$0 - $24,999	High school degree	West North Central
4	3.292763e+09	Yes	Yes	Star Wars: Episode I  The Phantom Menace	Star Wars: Episode II  Attack of the Clones	Star Wars: Episode III  Revenge of the Sith	Star Wars: Episode IV  A New Hope	Star Wars: Episode V The Empire Strikes Back	Star Wars: Episode VI Return of the Jedi	5	...	Very favorably	I don't understand this question	No	NaN	Yes	Male	18-29	$100,000 - $149,999	Some college or Associate degree	West North Central
5	3.292731e+09	Yes	Yes	Star Wars: Episode I  The Phantom Menace	Star Wars: Episode II  Attack of the Clones	Star Wars: Episode III  Revenge of the Sith	Star Wars: Episode IV  A New Hope	Star Wars: Episode V The Empire Strikes Back	Star Wars: Episode VI Return of the Jedi	5	...	Somewhat favorably	Greedo	Yes	No	No	Male	18-29	$100,000 - $149,999	Some college or Associate degree	West North Central

5 rows × 38 columns

Some columns are currently string types because the main values they contain are Yes and No. We can make the data a bit easier to analyze down the road by converting each of these columns to a Boolean with only the values True, False, and NaN.



In [46]:

    
bool_type = {
    'Yes': True,
    'No': False
}
star_wars['Have you seen any of the 6 films in the Star Wars franchise?'] = star_wars['Have you seen any of the 6 films in the Star Wars franchise?'].map(bool_type)
star_wars['Do you consider yourself to be a fan of the Star Wars film franchise?'] = star_wars['Do you consider yourself to be a fan of the Star Wars film franchise?'].map(bool_type)



In [47]:

    
star_wars['Do you consider yourself to be a fan of the Star Wars film franchise?'].head()









    Out[47]:





1     True
2      NaN
3    False
4     True
5     True
Name: Do you consider yourself to be a fan of the Star Wars film franchise?, dtype: object
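The output above still shows a NaN. As a minimal sketch of why, the example below applies the same dictionary-based map to a small made-up Series (the answers are hypothetical, not survey rows): any value without an entry in the dictionary, including a missing answer, comes back as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical answers, including one missing response.
answers = pd.Series(["Yes", "No", np.nan, "Yes"])

# Map the strings to Booleans, as in the cells above.
mapped = answers.map({"Yes": True, "No": False})

# The missing answer has no entry in the dictionary, so it stays NaN.
missing_count = mapped.isnull().sum()
print(mapped.tolist())
print(missing_count)
```

Because the column then mixes Booleans with NaN, pandas cannot store it as a plain bool column, which is consistent with the object dtype shown above.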

Next, convert the six "seen" columns to Boolean values and give them shorter names.



In [48]:

    
import numpy as np

bool_type1 = {
    "Star Wars: Episode I  The Phantom Menace": True,
    np.nan: False,
    "Star Wars: Episode II  Attack of the Clones": True,
    "Star Wars: Episode III  Revenge of the Sith": True,
    "Star Wars: Episode IV  A New Hope": True,
    "Star Wars: Episode V The Empire Strikes Back": True,
    "Star Wars: Episode VI Return of the Jedi": True
}

for col in star_wars.columns[3:9]:
    star_wars[col] = star_wars[col].map(bool_type1)
    
star_wars = star_wars.rename(columns={
    # The first "seen" column is still named after the survey question itself.
    'Which of the following Star Wars films have you seen? Please select all that apply.': 'seen_1',
    'Unnamed: 4': 'seen_2',
    'Unnamed: 5': 'seen_3',
    'Unnamed: 6': 'seen_4',
    'Unnamed: 7': 'seen_5',
    'Unnamed: 8': 'seen_6'
})



In [49]:

    
star_wars.head()



    Out[49]:



	RespondentID	Have you seen any of the 6 films in the Star Wars franchise?	Do you consider yourself to be a fan of the Star Wars film franchise?	seen_1	seen_2	seen_3	seen_4	seen_5	seen_6	Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.	...	Unnamed: 28	Which character shot first?	Are you familiar with the Expanded Universe?	Do you consider yourself to be a fan of the Expanded Universe?ÂÃ¦	Do you consider yourself to be a fan of the Star Trek franchise?	Gender	Age	Household Income	Education	Location (Census Region)
1	3.292880e+09	True	True	True	True	True	True	True	True	3	...	Very favorably	I don't understand this question	Yes	No	No	Male	18-29	NaN	High school degree	South Atlantic
2	3.292880e+09	False	NaN	False	False	False	False	False	False	NaN	...	NaN	NaN	NaN	NaN	Yes	Male	18-29	$0 - $24,999	Bachelor degree	West South Central
3	3.292765e+09	True	False	True	True	True	False	False	False	1	...	Unfamiliar (N/A)	I don't understand this question	No	NaN	No	Male	18-29	$0 - $24,999	High school degree	West North Central
4	3.292763e+09	True	True	True	True	True	True	True	True	5	...	Very favorably	I don't understand this question	No	NaN	Yes	Male	18-29	$100,000 - $149,999	Some college or Associate degree	West North Central
5	3.292731e+09	True	True	True	True	True	True	True	True	5	...	Somewhat favorably	Greedo	Yes	No	No	Male	18-29	$100,000 - $149,999	Some college or Associate degree	West North Central

5 rows × 38 columns



In [50]:

    
star_wars[star_wars.columns[9:15]] = star_wars[star_wars.columns[9:15]].astype(float)
star_wars.columns[9:15]









    Out[50]:





Index(['Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.',
       'Unnamed: 10', 'Unnamed: 11', 'Unnamed: 12', 'Unnamed: 13',
       'Unnamed: 14'],
      dtype='object')



In [51]:

    
star_wars = star_wars.rename(columns={
    'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.': "ranking_1",
    'Unnamed: 10': 'ranking_2',
    'Unnamed: 11': 'ranking_3',
    'Unnamed: 12': 'ranking_4',
    'Unnamed: 13': 'ranking_5',
    'Unnamed: 14': 'ranking_6'
})



In [52]:

    
star_wars.columns[9:15]









    Out[52]:





Index(['ranking_1', 'ranking_2', 'ranking_3', 'ranking_4', 'ranking_5',
       'ranking_6'],
      dtype='object')

Now that we've cleaned up the ranking columns, we can find the highest-ranked movie more quickly.



In [53]:

    
%matplotlib inline
import matplotlib.pyplot as plt
plt.bar(range(6), star_wars[star_wars.columns[9:15]].mean())









    Out[53]:





<Container object of 6 artists>

The fifth movie (Episode V The Empire Strikes Back) has the best average ranking (in this survey, 1 = best and 6 = worst). "Episode III Revenge of the Sith" has the worst average ranking.
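The bar chart can also be read off numerically. The sketch below is one way to do that, assuming a small made-up set of rankings (the values and the three-column subset are hypothetical): because 1 is the best rank, the column with the lowest mean is the favorite.

```python
import pandas as pd

# Hypothetical rankings from three respondents (1 = favorite, 6 = least favorite).
rankings = pd.DataFrame({
    "ranking_1": [3.0, 1.0, 5.0],
    "ranking_5": [1.0, 2.0, 1.0],
    "ranking_6": [2.0, 3.0, 2.0],
})

# Average rank per movie; a lower mean means a better-liked movie.
mean_rankings = rankings.mean()

# idxmin/idxmax return the column labels of the lowest and highest means.
best_column = mean_rankings.idxmin()
worst_column = mean_rankings.idxmax()
print(mean_rankings.sort_values())
print(best_column, worst_column)
```

Applied to the real data, the same calls on star_wars[star_wars.columns[9:15]] would give the labels behind the shortest and tallest bars.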



In [54]:

    
plt.bar(range(6), star_wars[star_wars.columns[3:9]].sum())









    Out[54]:





<Container object of 6 artists>

We can figure out how many people have seen each movie just by taking the sum of each column. The earlier movies are more popular, which matches the rankings above (the earlier movies also rank better).
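Summing works here because pandas treats True as 1 and False as 0. A minimal sketch with made-up viewing data (the values are hypothetical) shows both the count and the share of respondents per movie:

```python
import pandas as pd

# Hypothetical viewing flags for three respondents.
seen = pd.DataFrame({
    "seen_1": [True, False, True],
    "seen_5": [True, True, True],
})

# Each True counts as 1, so the column sum is the number of viewers.
view_counts = seen.sum()

# The column mean is the fraction of respondents who saw the movie.
view_share = seen.mean()
print(view_counts)
print(view_share)
```

The share is the more useful number when comparing groups of different sizes, since raw counts depend on how many respondents each group has.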



In [55]:

    
males = star_wars[star_wars["Gender"] == "Male"]
females = star_wars[star_wars["Gender"] == "Female"]



In [56]:

    
## Redo the two previous analyses (find the most viewed movie and the highest-ranked movie) separately for each group



In [59]:

    
## find highest-ranked movie (lower is better)
plt.bar(range(6), females[females.columns[9:15]].mean())
plt.show()
plt.bar(range(6), males[males.columns[9:15]].mean())
plt.show()



In [60]:

    
## find most viewed movie (higher is better)
plt.bar(range(6), females[females.columns[3:9]].mean())
plt.show()
plt.bar(range(6), males[males.columns[3:9]].mean())
plt.show()

More males have watched all of the episodes, but they rate only the earlier movies highly. Fewer females have watched the newer episodes, but they rate those newer ones better.
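As an alternative to splitting the data into two DataFrames by hand, the comparison can be sketched with groupby. The rows below are made up for illustration, and only one ranking column and one "seen" column are included:

```python
import pandas as pd

# Hypothetical survey rows; real data would have six ranking and six seen columns.
survey = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "ranking_1": [4.0, 2.0, 6.0, 3.0],
    "seen_1": [True, False, True, True],
})

# One row per gender: mean rank (lower is better) and share who saw the movie.
by_gender = survey.groupby("Gender")[["ranking_1", "seen_1"]].mean()
print(by_gender)
```

This puts both groups in a single table, which makes side-by-side comparison easier than reading two separate bar charts.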
